Abstract: We consider the problem of partially recovering
hidden binary variables from the observation of (few) censored
edge weights, a problem with applications in community detection,
correlation clustering and synchronization. We describe two
spectral algorithms for this task based on the non-backtracking 
and the Bethe Hessian operators. These algorithms are shown 
to be asymptotically optimal for the partial recovery problem, in 
that they detect the hidden assignment as soon as it is information 
theoretically possible to do so. 


A. Introduction 

In many inference problems, the available data can be 
represented on a weighted graph. Given the knowledge of 
the edge weights, the task is to infer latent variables carried 
by the nodes. Here, we shall consider the problem of recovering
binary node labels from censored edge measurements [?], [?].
Specifically, given an Erdős-Rényi random graph
$G = (V,E) \in \mathcal{G}(n,\alpha/n)$ with $n$ nodes carrying latent
variables $\sigma_i = \pm 1$, $1 \le i \le n$, we draw the edge labels
$J_{ij} = \pm 1$, $(ij) \in E$, from the following distribution:

$$P(J_{ij} \mid \sigma_i, \sigma_j) = (1-\epsilon)\,\mathbb{1}(J_{ij} = \sigma_i\sigma_j) + \epsilon\,\mathbb{1}(J_{ij} = -\sigma_i\sigma_j), \tag{1}$$

where $\epsilon$ is a noise parameter. In the noiseless case $\epsilon = 0$, we
have $\sigma_i\sigma_j = J_{ij}$ and one can easily recover the communities
in each connected component along a spanning tree. When
$\epsilon = 1/2$, on the other hand, the graph doesn't contain any
information about the latent variables $\sigma_i$, and recovery is
impossible. What happens in between? The problem of exactly
recovering the latent variables $\sigma_i$ has been studied in [?].
It turns out that, asymptotically in the large $n$ limit, exact
recovery is possible if and only if
where $\alpha$ is the average degree of the graph. Note that the
variable of an isolated vertex cannot be recovered so that the 
average degree has to grow at least like log n, as in the Coupon 
collector’s problem, to ensure that the graph is connected. 
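The generative model of eq. (1) can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `sample_censored_block_model`, its parameters and the dictionary representation of the edge labels are assumptions made here, and the quadratic loop over vertex pairs is only suitable for small $n$.

```python
import random


def sample_censored_block_model(n, alpha, eps, seed=0):
    """Sample labels sigma_i = +/-1, a graph G(n, alpha/n) and edge labels J_ij.

    Hypothetical helper: J is returned as a dict mapping (i, j), i < j, to +/-1.
    """
    rng = random.Random(seed)
    sigma = [rng.choice((-1, 1)) for _ in range(n)]
    p = alpha / n  # edge probability of the Erdos-Renyi graph
    J = {}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                truth = sigma[i] * sigma[j]
                # With probability eps the observed label is flipped, as in eq. (1).
                J[(i, j)] = -truth if rng.random() < eps else truth
    return sigma, J
```

With `eps = 0` every observed label equals $\sigma_i\sigma_j$, and with `eps = 0.5` the labels carry no information about $\sigma$, matching the two limiting cases discussed above.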

We consider in this paper the case where the average degree
$\alpha$ remains fixed as $n$ tends to infinity. In this setting, we
cannot ask for exact recovery and we consider here a different
question: is it possible to infer an assignment $\hat\sigma_i$ of the latent
variables that is positively correlated with the planted variables
$\sigma_i$? We call positively correlated an assignment $\hat\sigma_i$ such that
the following quantity, called overlap, is strictly positive:



In the limit $n \to \infty$, this overlap vanishes for a random
guess $\hat\sigma_i$, and is equal to unity if the recovery is exact. We will
refer to the task of finding a positively correlated assignment
$\hat\sigma_i$ as partial recovery. This task has been shown [3], [4] to be
possible only if

$$\alpha > \alpha_{\rm detect} = \frac{1}{(1-2\epsilon)^2}. \tag{4}$$
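As a quick numerical illustration of the detection threshold, the helper below evaluates $1/(1-2\epsilon)^2$. The function name `alpha_detect` is introduced here for illustration only.

```python
def alpha_detect(eps):
    """Detection threshold 1 / (1 - 2*eps)^2 on the average degree."""
    if eps == 0.5:
        raise ValueError("eps = 1/2 carries no information; the threshold is infinite")
    return 1.0 / (1.0 - 2.0 * eps) ** 2
```

For example, `alpha_detect(0.25)` gives 4.0, the value used in the numerical experiments, and the threshold diverges as $\epsilon$ approaches 1/2.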

To the best of our knowledge, there is no rigorous proof
that this bound is also sufficient. In [?], the same authors
also showed that belief propagation (BP) saturates
this bound. However, there is no rigorous analysis of BP for
this problem: the fact that condition (4) is necessary and
sufficient was left as a conjecture in [3], and only the necessary
part was proved in [4]. Moreover, from a practical point of
view, BP requires the knowledge of the noise parameter $\epsilon$.

In this contribution, we describe two simple spectral algorithms
and we show rigorously that they are optimal, in
the sense that they can perform partial recovery as soon as
$\alpha > \alpha_{\rm detect}$. Additionally, the output of these algorithms is
shown numerically to have an overlap similar to that of BP,
without requiring the knowledge of the noise parameter $\epsilon$.
This closes the gap from [?], [?], where spectral methods are
introduced that succeed only if the connectivity is significantly
larger than the threshold (4). The resulting algorithms are thus
fast, trivial to implement, and asymptotically optimal.

B. Motivation and Related work 

There are various interpretations and models that connect
to this problem, such as i) Community detection [?]: we try
to recover the community membership of the nodes based on
noisy (or censored) observations about their relationship; ii)
Correlation clustering [5]: we try to cluster the graph $G$ by
minimizing the number of "disagreeing edges" ($J_{ij} = -1$) in
each cluster. These examples, and others such as synchronisation,
are discussed in detail in [1].

The inspiration for the present contribution comes from 
recent developments in the problem of detecting communities 






in the (sparse) stochastic block model. The threshold for partial
recovery in the stochastic block model was conjectured in [6]
and proved in [?]. Optimal spectral methods, based on
the same operators as the algorithms introduced here, were
proposed in [10], [11]. These operators were in particular
shown to be much better suited to very sparse graphs than
the traditional adjacency or Laplacian operators.
Interestingly, this problem first appeared in statistical
physics. Indeed, the posterior distribution corresponding to eq.
(1) reads, using $\beta_0 = \frac{1}{2}\log\frac{1-\epsilon}{\epsilon}$,

$$P(\sigma \mid J) = \frac{e^{\beta_0 \sum_{(ij)\in E} J_{ij}\sigma_i\sigma_j}}{\sum_{\sigma'} e^{\beta_0 \sum_{(ij)\in E} J_{ij}\sigma'_i\sigma'_j}}. \tag{5}$$

This is nothing but the spin glass [?] problem where the
couplings $J_{ij}$ are correlated with the "planted" configuration
$\sigma_i$ [?]. Such problems can also be shown to be
equivalent to spin glasses on the so-called Nishimori line
[?], [?]. With these notations, the detection condition (4)
corresponds to the well-known spin glass transition [16], [17]
at $\alpha_{\rm detect}\tanh^2\beta_0 = 1$. In this spin glass context, [18]
already conjectured that a spectral algorithm based on the non-backtracking
operator (see sec. I-A) was optimal.


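$(\lambda \neq \pm 1, v \in \mathbb{R}^{2m})$ is an eigenpair of $B$, then $(\lambda, v' \in \mathbb{R}^{2n})$
is an eigenpair of $B'$ if

$$v'_{n+i} = \sum_{j \in \partial i} J_{ij}\, v_{i \to j}, \quad \forall\, 1 \le i \le n, \tag{8}$$

$$\lambda v'_i = (d_i - 1)\, v'_{n+i}, \tag{9}$$

where $\partial i$ and $d_i$ are the set of neighbors and the degree of
node $i$. We will therefore favor using $B'$. The algorithm is
then as follows: given a graph with edge weights $J_{ij}$,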

Algorithm 1

1) build the matrix $B'$

2) compute its leading eigenvalue $\lambda_1$ (with largest magnitude),
and its corresponding eigenvector $v' = \{v'_i\}$.

3) if $\lambda_1 \in \mathbb{R}$ and $\lambda_1 > \sqrt{\alpha}$, where $\alpha$ is the average
degree of the graph, set $\hat\sigma_i = \mathrm{sign}(v'_{n+i})$. Otherwise,
raise an error.

Theorem 1 ensures that whenever (4) holds, this algorithm
outputs an assignment $\hat\sigma_i$ that is positively correlated with the
planted latent variables $\sigma_i$.
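The three steps of Algorithm 1 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: it assumes the block form $B' = \begin{pmatrix} 0 & D - \mathbb{1} \\ -\mathbb{1} & J \end{pmatrix}$, it replaces a general eigensolver by plain power iteration (which only converges when the leading eigenvalue is real and isolated), and it uses a residual test as a stand-in for the check that $\lambda_1$ is real. The name `algorithm1` and the dictionary input format are hypothetical.

```python
import math
import random


def algorithm1(n, J, iters=500, tol=1e-6, seed=0):
    """Sketch of Algorithm 1. J maps undirected edges (i, j) to weights +1 or -1.

    Returns (lambda_1, labels) or raises ValueError, mirroring step 3.
    """
    nbrs = [[] for _ in range(n)]
    for (i, j), w in J.items():
        nbrs[i].append((j, w))
        nbrs[j].append((i, w))
    deg = [len(a) for a in nbrs]
    alpha = sum(deg) / n  # average degree

    def apply_b_prime(top, bot):
        # B' = [[0, D - 1], [-1, J]] applied to v' = (top, bot), never stored densely.
        new_top = [(deg[i] - 1) * bot[i] for i in range(n)]
        new_bot = [-top[i] + sum(w * bot[j] for j, w in nbrs[i]) for i in range(n)]
        return new_top, new_bot

    rng = random.Random(seed)
    top = [rng.gauss(0.0, 1.0) for _ in range(n)]
    bot = [rng.gauss(0.0, 1.0) for _ in range(n)]
    lam, residual = 0.0, float("inf")
    for _ in range(iters):
        norm = math.sqrt(sum(x * x for x in top + bot))
        top = [x / norm for x in top]
        bot = [x / norm for x in bot]
        new_top, new_bot = apply_b_prime(top, bot)
        # Rayleigh quotient of the unit vector (top, bot).
        lam = sum(a * b for a, b in zip(top + bot, new_top + new_bot))
        residual = math.sqrt(sum((b - lam * a) ** 2
                                 for a, b in zip(top + bot, new_top + new_bot)))
        top, bot = new_top, new_bot

    # A large residual means no real dominant eigenpair was found.
    if residual > tol * max(1.0, abs(lam)) or lam <= math.sqrt(alpha):
        raise ValueError("no real eigenvalue above sqrt(alpha) found")
    # Labels are read off the lower half of v', i.e. the entries v'_{n+i}.
    return lam, [1 if x >= 0 else -1 for x in bot]
```

On a noiseless complete graph with 8 vertices, the leading eigenvalue of $B'$ is 6 and the lower half of the eigenvector is proportional to $\sigma$, so the sketch recovers the labels up to a global sign. For large sparse graphs, a sparse non-symmetric eigensolver would be a more robust choice than power iteration.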

B. The Bethe Hessian 


C. Outline and main results 

In section I we describe two spectral algorithms that
achieve the threshold (4). These algorithms are based on two
linear operators: the non-backtracking operator introduced in
[10], and the Bethe Hessian introduced in [11]. We further
illustrate their properties by showing the results of numerical
experiments. In section II we list the spectral properties of
the non-backtracking operator that are relevant to the present
context. Finally, we discuss the properties of the Bethe Hessian
and its relation with the non-backtracking operator in section
III and discuss its connection with the Bethe free energy.

I. Spectral algorithms 


Another operator closely related to the non-backtracking
operator was introduced in [11]. This operator, called the Bethe
Hessian, is an $n \times n$ real and symmetric matrix defined as

$$H = (\alpha - 1)\mathbb{1} - \sqrt{\alpha}\, J + D, \tag{10}$$

where $D$ is the diagonal matrix of vertex degrees. Based on this
operator, we propose the following algorithm: given a graph
with edge weights $J_{ij}$,

Algorithm 2

1) build the Bethe Hessian $H$

2) compute its (algebraically) smallest eigenvalue $\lambda$, and
its corresponding eigenvector $v$.

3) if $\lambda < 0$, set $\hat\sigma_i = \mathrm{sign}(v_i)$. Otherwise, raise an error.

A. The non-backtracking operator 

The non-backtracking operator acts on the directed edges
$i \to j$ of the graph as

$$B_{i \to j, k \to l} = J_{kl}\, \mathbb{1}(j = k)\, \mathbb{1}(i \neq l). \tag{6}$$

It is therefore represented by a $2m \times 2m$ matrix, where $m$ is
the number of edges in the graph. As discussed in [10], [?],
the motivation for using this operator is that it corresponds to
the linear approximation of belief propagation for this problem
around the so-called uninformative fixed point of BP.
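To make the definition concrete, the sketch below builds the dense $2m \times 2m$ matrix for a small graph, assuming the reading $B_{i \to j, k \to l} = J_{kl}\,\mathbb{1}(j = k)\,\mathbb{1}(i \neq l)$ of eq. (6). The function name `non_backtracking_matrix` and the edge ordering are choices made here for illustration, and a dense matrix is only sensible for tiny examples.

```python
def non_backtracking_matrix(J):
    """Return (edges, B) for edge weights J given as a dict {(i, j): +/-1}.

    edges lists the 2m directed edges; B[a][b] is the entry for e = edges[a] = i->j
    and f = edges[b] = k->l, equal to J_kl when j == k and i != l, and 0 otherwise.
    """
    edges = []
    weight = {}
    for (i, j), w in J.items():
        edges.extend([(i, j), (j, i)])
        weight[(i, j)] = w
        weight[(j, i)] = w
    B = [[0] * len(edges) for _ in edges]
    for a, (i, j) in enumerate(edges):
        for b, (k, l) in enumerate(edges):
            if j == k and i != l:  # f continues e without backtracking
                B[a][b] = weight[(k, l)]
    return edges, B
```

On a triangle every vertex has degree 2, so each row of $B$ has exactly one non-zero entry: the single way to continue along the cycle without turning back.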

Similarly to [?], one can show (see Sec. III for details)
that the eigenvalues of $B$ that are different from $\pm 1$ form the
spectrum of the simpler $2n \times 2n$ matrix

$$B' = \begin{pmatrix} 0 & D - \mathbb{1} \\ -\mathbb{1} & J \end{pmatrix}, \tag{7}$$

where $\mathbb{1}$ is the $n \times n$ identity matrix, $D$ is the diagonal matrix
defined by $D_{ii} = d_i$, where $d_i$ is the degree of node $i$, and
$J$ has entries equal to the edge weights $J_{ij}$. Furthermore, if


Justifications for this second algorithm, and its relation with the
first one, will be provided in section III. Compared to the first
algorithm, this second one is based on a smaller, symmetric
matrix, which leads to improved numerical performance and
stability. Additionally, in the case of more general edge weights
$J_{ij} \neq \pm 1$, the reduction of $B$ to a smaller matrix $B'$ fails, and
one has to work with a $2m \times 2m$ matrix. The Bethe Hessian, on
the other hand, generalizes easily to arbitrary weights without
any loss in scalability [?].


C. Numerical results 

Before turning to proofs, we show in figure 1 the numerical
performance of our two algorithms, and compare them with the
performance of belief propagation ([?], [?]), which is believed
to be optimal on such locally tree-like graphs in the sense
that it gives, arguably, the Bayes optimal value of the overlap
asymptotically. As shown in section II, both algorithms 1 and
2 are able to achieve partial recovery as soon as $\alpha > \alpha_{\rm detect}$,
and their overlap is similar to that of BP, though of course
strictly smaller. Note again that BP requires the knowledge of
$\epsilon$ while the two spectral algorithms described here do not, are











Fig. 1. Overlap as a function of $\alpha$: comparison between algorithm 1 (based
on the non-backtracking operator $B$), algorithm 2 (based on the Bethe Hessian
$H$), and belief propagation (BP). The noise parameter $\epsilon$ is fixed to 0.25
(corresponding to $\alpha_{\rm detect} = 4$), and we vary $\alpha$. The overlap for $B$ and $H$ is
averaged over 20 graphs of size $n = 10^5$. The overlap for BP is estimated
asymptotically using the standard method of population dynamics (see for
instance [20]), with a population of size $10^4$. All three methods output a
positively correlated assignment as soon as $\alpha > \alpha_{\rm detect}$. Spectral algorithms
1 and 2 have an overlap similar to that of BP, with the same phase transition,
while being simpler and not requiring the knowledge of the parameter $\epsilon$.


trivial to implement, run faster, and avoid the potential non-convergence
problem of belief propagation while remaining
asymptotically optimal in detecting the hidden assignment. We 
also observe, empirically, that the overlap given by the Bethe 
Hessian seems to be always superior to the one provided by 
the non-backtracking operator. 


II. Spectral properties of the non-backtracking operator


In this section, we state results concerning the spectrum of
$B$ and show that algorithm 1 outputs an assignment $b_i$ that is
positively correlated with the planted one, whenever (4) holds.


As already noticed in previous work for the case of an
unweighted random graph [10], the superior performance
of the non-backtracking operator $B$ is due to the particular
shape of its spectrum. In the case of the stochastic block model
[22], it decomposes into a bulk of uninformative eigenvalues
contained in a disk of radius $\sqrt{\alpha}$ in the complex plane, and
a few real and informative eigenvalues outside of the disk.
This observation was recently proven in [23], in the case of 2
communities.


The following theorem generalizes this previous result to 
the present setting and is the main result of this paper. 

Theorem 1: Given an Erdős-Rényi random graph with
average degree $\alpha$, variables $\sigma_i = \pm 1$ assigned to vertices
uniformly at random independently from the graph and where
the edges carry weights sampled from (1), we denote by $B$
the non-backtracking operator defined by (6), and by
$|\lambda_1| \ge |\lambda_2| \ge \cdots \ge |\lambda_{2m}|$ the eigenvalues of $B$ in order of decreasing
magnitude. Then, with probability tending to 1 as $n \to \infty$, we
have:

(i) if $\alpha < \alpha_{\rm detect}$ then $|\lambda_1| \le \sqrt{\alpha} + o(1)$.

(ii) if $\alpha > \alpha_{\rm detect}$, then $\lambda_1 \in \mathbb{R}$, $\lambda_1 = \alpha(1 - 2\epsilon) + o(1) > \sqrt{\alpha}$,
and $|\lambda_2| \le \sqrt{\alpha} + o(1)$. Additionally, denoting
$v$ the eigenvector associated with $\lambda_1$, the following
assignment is positively correlated with the planted
variables $\sigma$:

$$b_i = \mathrm{sign}\Big(\sum_{j \in \partial i} v_{j \to i}\Big).$$

This theorem is illustrated in Fig. 2.

It is then straightforward to show the following:

Corollary 1: The assignment output by Algo. 1 is positively
correlated with the planted variables $\sigma$ if and only if

$$\alpha > \alpha_{\rm detect}. \tag{11}$$



We now give a brief sketch of proof for our Theorem 1. The
proof relies heavily on the techniques developed in [23]. We
try to use notation consistent with [23]: $\vec E$ is the set of oriented
edges and for any $e = u \to v = (u, v) \in \vec E$, we set $e_1 = u$,
$e_2 = v$ and $e^{-1} = (v,u)$. For a matrix $M$, its transpose is
denoted by $M^*$. We start with a simple observation: if $t$ is
the vector in $\mathbb{R}^{\vec E}$ defined by $t_e = \sigma_{e_2}$ and $\odot$ is the Hadamard
product, i.e. $(t \odot x)_e = \sigma_{e_2} x_e$, then we have

$$\tilde B x = \lambda x \iff B (t \odot x) = \lambda (t \odot x), \tag{12}$$

with $\tilde B$ defined by $\tilde B_{ef} = B_{ef}\, \sigma_{e_2} \sigma_{f_2}$. In particular, $\tilde B$ and $B$ have
the same spectrum and there is a trivial relation between their
eigenvectors. It will be easier to work with $\tilde B$ so to lighten the
notation, we will denote (in this section):

$$B_{ef} = \mathbb{1}(e_2 = f_1)\, \mathbb{1}(e_1 \neq f_2)\, P_f,$$

where $P_f = \sigma_{f_1} J_f \sigma_{f_2}$. Note that the random variables $P_f$ are
now i.i.d. with $\mathbb{P}(P_f = 1) = 1 - \mathbb{P}(P_f = -1) = 1 - \epsilon$. With
this formulation, the problem is said in statistical physics to
be "on the Nishimori line" [?], [?].

For the case $(1 - 2\epsilon)^2 \alpha < 1$, the proof is relatively easy.
Indeed, from [3], we know that our setting is contiguous to the
setting with $\epsilon = 1/2$. In this case, the random variables $P_f$
are centered and a version of the trace method allows one to
upper bound the spectral radius of $B$. Note however, that one
needs to condition on the graph to be $\ell$-tangle-free, i.e. such
that every neighborhood of radius $\ell$ contains at most one cycle,
in order to apply the first moment method.

We now consider the case $(1 - 2\epsilon)^2 \alpha > 1$ and denote by
$P$ the linear mapping on $\mathbb{R}^{\vec E}$ defined by $(Px)_e = P_e x_{e^{-1}}$
(i.e. the matrix associated to $P$ is $P_{ef} = P_e \mathbb{1}(f = e^{-1})$).
Note that $P^* = P$ and since $P^2 = \mathbb{1}$, $P$ is an involution so
that $P$ is an orthogonal matrix. A simple computation shows
that $B^k P = P B^{*k}$, hence $B^k P$ is a symmetric matrix. This
symmetry corresponds to the oriented path symmetry in [23]
and will be crucial to our analysis.

We also define $\bar\alpha = (1 - 2\epsilon)\alpha$ and $\chi \in \mathbb{R}^{\vec E}$ with $\chi_e = 1$
for all $e \in \vec E$. The proof strategy is then similar to Section
5 in [23]. Consider a sequence $\ell \sim \kappa \log_\alpha n$ for some small
positive $\kappa$. Let

$$\varphi = \frac{B^\ell \chi}{\|B^\ell \chi\|}, \qquad \theta = \|B^\ell P \varphi\|, \qquad \zeta = \frac{B^\ell P \varphi}{\theta}.$$

If $R = B^\ell - \theta\, \zeta\, (P\varphi)^*$ and we can prove that $\|R\|$ is small in
comparison with $\theta$, then we can use a theorem on perturbation
of eigenvalues and eigenvectors adapted from the Bauer-Fike
theorem (see Section 4 in [23]) saying that $B^\ell$ should have an
eigenvalue close to $\theta$.

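More precisely, for $y \in \mathbb{R}^{\vec E}$ with $\|y\| = 1$, write
$y = s P\varphi + x$ with $x \in (P\varphi)^\perp$ and $s \in \mathbb{R}$. Then, we find

$$\|R y\| = \|B^\ell x + s (B^\ell P\varphi - \theta \zeta)\| \le \sup_{x : \langle x, P\varphi \rangle = 0,\ \|x\| = 1} \|B^\ell x\|.$$

This last quantity can be shown to be upper bounded by
$(\log n)^c \alpha^{\ell/2}$ similarly as in Proposition 12 in [23]. Moreover,
we can also show that w.h.p.

$$\langle \zeta, P\varphi \rangle \ge c_0, \qquad c_0 \bar\alpha^\ell \le \theta \le c_1 \bar\alpha^\ell. \tag{13}$$

These bounds allow one to show that $B$ has an eigenvalue $\lambda_1$ with
$|\lambda_1 - \bar\alpha| = O(1/\ell)$ and that $|\lambda_2| \le \sqrt{\alpha} + o(1)$.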


Note that $\theta = \frac{\|B^\ell P B^\ell \chi\|}{\|B^\ell \chi\|}$, so that we need to compute
quantities of the type $\|B^\ell \chi\|$. We now explain the main ideas
to compute these quantities. First note that $(B^\ell \chi)_e$ depends
only on the ball of radius $\ell$ around the edge $e$. For $\ell$ not too
large, this neighborhood can be coupled with a Galton-Watson
branching process with offspring distribution Poi($\alpha$). It is
then natural to consider this Poisson Galton-Watson branching
process with i.i.d. weights $P_{uv} \in \{\pm 1\}$ on its edges with
mean $1 - 2\epsilon$. For $u$ in the tree, we denote by $|u|$ its generation
and by $Y(u) = \prod_{s=1}^{t} P_{\gamma_s \gamma_{s+1}}$ where $\gamma = (\gamma_1, \dots, \gamma_{t+1})$ is the
unique path between the root $o = \gamma_1$ and $u = \gamma_{t+1}$, with $t = |u|$.
Then $(B^\ell \chi)_e$ is well approximated by $Z_\ell$, where

$$Z_t = \sum_{|u| = t} Y(u).$$

It is easy to see that $X_t = Z_t / \bar\alpha^t$, with $\bar\alpha = (1 - 2\epsilon)\alpha$ as above,
is a martingale (with respect to the natural filtration) with mean one,
since $\mathbb{E}[Z_t] = \alpha^t (1 - 2\epsilon)^t$. Moreover we have

$$\mathbb{E}[Z_t^2] = \mathbb{E}\Big[\sum_{u,v : |u| = |v| = t} Y(u) Y(v)\Big] = \sum_{i=0}^{t} \alpha^{t-i} (1 - 2\epsilon)^{2i} \alpha^{2i} = O(\bar\alpha^{2t}),$$

where the last equality is valid only if $(1 - 2\epsilon)^2 \alpha > 1$. So
in this case, we have $\mathbb{E}[X_t^2] = O(1)$ and the martingale
$X_t$ converges a.s. and in $L^2$ to a limiting random variable
$X(\infty)$ with mean one. Following the argument as in [23], this
reasoning leads to (13).
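The role of the condition $(1 - 2\epsilon)^2 \alpha > 1$ in the second-moment bound can be checked numerically. The sketch below evaluates the sum $\sum_{i=0}^{t} \alpha^{t-i} (1 - 2\epsilon)^{2i} \alpha^{2i}$ divided by $((1 - 2\epsilon)\alpha)^{2t}$, which is the second moment of the normalized martingale. The function name `second_moment_X` is introduced here for illustration.

```python
def second_moment_X(alpha, eps, t):
    """E[X_t^2] = sum_{i=0}^t alpha^(t-i) (1-2eps)^(2i) alpha^(2i) / ((1-2eps) alpha)^(2t)."""
    abar = (1 - 2 * eps) * alpha
    total = sum(alpha ** (t - i) * (1 - 2 * eps) ** (2 * i) * alpha ** (2 * i)
                for i in range(t + 1))
    return total / abar ** (2 * t)
```

Dividing term by term shows that this ratio is the geometric sum $\sum_{k=0}^{t} r^k$ with $r = 1 / ((1 - 2\epsilon)^2 \alpha)$. For $\epsilon = 0.25$ and $\alpha = 8$ one has $r = 1/2$ and the ratio stays below 2 for all $t$, whereas for $\alpha = 3$ one has $r = 4/3$ and the ratio grows without bound.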

We now consider the eigenvector associated with $\lambda_1$. It
follows from the Bauer-Fike theorem (see Section 4 in [23]) that
the eigenvector $x$ associated to $\lambda_1$ is asymptotically aligned
with $\frac{B^\ell B^{*\ell} P\chi}{\|B^\ell B^{*\ell} P\chi\|}$. Thanks to the coupling with the branching



Fig. 2. Spectrum of the non-backtracking matrix in the complex plane for a
problem generated with $\epsilon = 0.25$, $n = 2000$. We used $\alpha = 3$ (left side) and
$\alpha = 8$ (right side), to be compared with $\alpha_{\rm detect} = 4$. Each point represents
an eigenvalue. In both cases, the bulk of the spectrum is confined in a circle
of radius $\sqrt{\alpha}$. However, when $\alpha > \alpha_{\rm detect}$, a single isolated eigenvalue
appears out of the bulk at $(1 - 2\epsilon)\alpha$ (see the arrow on the right plot) and the
corresponding eigenvector is correlated with the planted assignment.


process, we can prove that $\|B^\ell B^{*\ell} P\chi\| \sim \bar\alpha^{2\ell}$ and moreover,
we have for $e \in \vec E$,


(B e B* e P X )e 

1 


t(l — 2 e ) 2 


—X(oo), 


(14) 


where $X(\infty)$ is the limit of the martingale defined above
and has mean one. We can now translate this result to the
eigenvector of the original non-backtracking operator thanks
to (12): $v_e = \sigma_{e_2} x_e$ where $x_e$ is approximated by (14). In
particular, we see that $\sum_{e : e_2 = v} v_e$ is correlated with $\sigma_v$.


III. From the non-backtracking operator to the 
Bethe Hessian 


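In this section, we relate the spectra of $H$, $B$ and $B'$ by
generalizing some properties discussed in [10], [11].
$(\lambda \neq \pm 1, v \in \mathbb{R}^{2m})$ being an eigenpair of $B$, we define

$$v_i = \sum_{j \in \partial i} J_{ij}\, v_{i \to j}, \quad \forall\, 1 \le i \le n. \tag{15}$$

Since $\lambda v_{i \to j} = \sum_{l \in \partial j \setminus i} J_{jl}\, v_{j \to l}$, it follows that $\lambda v_{i \to j} = v_j - J_{ij}\, v_{j \to i}$.
Closing the equation on the single site elements $v_i$
thus leads to

$$\sum_{k} \big[ (\lambda^2 - 1)\, \delta_{ik} - \lambda J_{ik} + d_i\, \delta_{ik} \big]\, v_k = 0. \tag{16}$$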


For convenience, we now define the matrix:

$$H(\lambda) = (\lambda^2 - 1)\mathbb{1} - \lambda J + D. \tag{17}$$

Note in particular that the Bethe Hessian reads $H = H(\sqrt{\alpha})$.
Given that the values of $J$ are $\pm 1$, all eigenvalues of $B$ different
from $\pm 1$ thus must satisfy the following generalization
of the Ihara-Bass formula [24]:

$$\det\big[(\lambda^2 - 1)\mathbb{1} - \lambda J + D\big] = \det H(\lambda) = 0. \tag{18}$$

To solve (16) one needs to find an eigenvector $v$ of $H(\lambda)$ with
a zero eigenvalue. This is a quadratic eigenproblem, which
can be turned into a linear one by introducing the matrix $B'$
of Algo. 1. Indeed, if $\lambda \in \mathbb{R}$ is an eigenvalue of $B'$ with




















eigenvector $v'$, then it follows that $v := \{v'_i\}_{n+1 \le i \le 2n}$ is
an eigenvector of $H(\lambda)$ with eigenvalue 0, so that $\lambda$ is an
eigenvalue of $B$ as well (at least if $\lambda \neq \pm 1$), justifying eq. (7).

Note that since we are interested in values of $\lambda > 1$
(since $\lambda > \sqrt{\alpha}$ and we need $\alpha > 1$ from (4)), the limitation of
looking at $\lambda \neq \pm 1$ is irrelevant.
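The linearization step can be spelled out as a short worked derivation. It assumes the block form of $B'$ with first block row $(0,\ D - \mathbb{1})$ and second block row $(-\mathbb{1},\ J)$; the names $x$ and $y$ for the upper and lower halves of $v'$ are introduced here only for illustration.

```latex
\begin{align*}
B' \begin{pmatrix} x \\ y \end{pmatrix} = \lambda \begin{pmatrix} x \\ y \end{pmatrix}
&\iff (D - \mathbb{1})\, y = \lambda x, \qquad -x + J y = \lambda y \\
&\implies -(D - \mathbb{1})\, y + \lambda J y = \lambda^2 y
   \qquad \text{(multiply the second row by } \lambda \text{, then substitute } \lambda x \text{)} \\
&\iff \big[(\lambda^2 - 1)\mathbb{1} - \lambda J + D\big]\, y = H(\lambda)\, y = 0 .
\end{align*}
```

The lower half $y = \{v'_i\}_{n+1 \le i \le 2n}$ is therefore a zero-eigenvalue eigenvector of $H(\lambda)$, which is the statement used in the text.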

Finally, following [?], we can relate the spectra of $B$ and
$H$ by the following argument. For $\lambda$ large enough, $H(\lambda)$
is positive definite. Then as $\lambda$ decreases, $H(\lambda)$ will gain a
new negative eigenvalue whenever $\lambda$ becomes equal to an
eigenvalue of $B$. This justifies the following corollary:

Corollary 2: if the conditions of Theorem 1 apply, then
$H = H(\sqrt{\alpha})$ has a unique negative eigenvalue if $\alpha > \alpha_{\rm detect}$,
and none otherwise.

Strictly speaking, if we denote by $\lambda_1$ the leading eigenvalue
of $B$, we have only shown that the eigenvector with eigenvalue
0 of $H(\lambda_1)$ is positively correlated with the planted variables
if $\alpha > \alpha_{\rm detect}$. However, we observe numerically (see figure
1) that the eigenvector with negative eigenvalue of $H$ is also
positively correlated, and in fact gives a slightly better overlap.
This point will have to be clarified in future work.

It is worth noting that the Bethe Hessian is also related to
the belief propagation algorithm. [25] showed that the fixed
points of the BP recursion are stationary points of the so-called
Bethe free energy. Direct optimization of the Bethe free
energy has then been proposed as an alternative to BP [26].
In this context, [?] showed that the so-called paramagnetic
fixed point (corresponding to an uninformative assignment) is
a local minimum of the Bethe free energy if and only if $H$ is
positive definite. Algo. 2 can therefore be seen as a spectral
relaxation of the direct optimization of the Bethe free energy.
In the end, both approaches are indeed deeply related to BP.

IV. Conclusion 

We have considered the problem of partially recovering binary
variables from the observation of censored edge weights,
and described two optimal spectral algorithms for this task
that can provably perform partial recovery as soon as it is
information theoretically possible to do so. Remarkably, these
algorithms do not require the knowledge of the noise parameter
$\epsilon$ and perform almost as well as belief propagation, which is
expected (but not proved) to be Bayes optimal for this problem.
This closes the gap from previous works, both
algorithmically, by providing optimal spectral algorithms, and
theoretically, by proving that the transition (4) is a necessary
and sufficient condition for partial recovery.